This addition represents a further optimization of the sequences, of the number of required oligonucleotide probes and/or of the number and the length of cloned fragments of genomic DNA, in order to achieve as complete a sequence as possible. Said addition does not change the essence of the method of achieving this, namely the determination of sequences of replicated fragments of genomic DNA, and thus also of the sequences of the entire genomic DNA or parts thereof of one or several species, by arranging positively hybridized oligonucleotide probes via overlapping sequences and by carrying out the hybridization under conditions in which these probes hybridize only to fully homologous sequences.

Because this addition includes several changes and clarifications, for greater clarity the addition is written in the form of a completely new DESCRIPTION, which consists of [illegible] (Serbo-Croatian) pages. In the new description, we did not include the use of plasmid vectors and of amplification as possible methods for obtaining replicated fragments of genomic DNA. Competitive hybridization is the only substantially new part of the entire procedure. Moreover, the PATENT CLAIMS have been modified and now consist of 5 claims, and the ABSTRACT is new.

Applicant: 
Prof. Dr. Vladimir Glišin
Director 



DESCRIPTION 



Compared to prior-art sequencing methods, our procedure is based on an entirely different logic and is applicable only to the determination of sequences of complex DNA fragments and/or molecules (more than one million base pairs). It is based on highly specific hybridization of oligonucleotide probes (ONPs) with a length of 11 to 20 nucleotides.

Conditions for ONP hybridization have been found which differentiate between complete homology with a target and nonhomology in a single base (Wallace, R.B., et al., Nucleic Acids Res. 6, 3543-3557 (1979)). When the hybridization method with 3 M tetramethylammonium chloride is used, the melting point of the hybrid depends only on the ONP length, regardless of the GC composition (Wood, W.I., et al., Proc. Natl. Acad. Sci. USA 82, 1585-1588 (1985)). Hence, by hybridization under these conditions, sequences are determined unambiguously. By hybridization of genomic DNA, replicated in subclones (SCs) of appropriate length, with a sufficient number of ONPs and by computerized arrangement of the detected sequences, it is possible to sequence the entire genome at the same time. We believe that this procedure is an order of magnitude faster and less expensive than the ones now being developed, and that for this reason it is applicable to the sequencing of the genomes of all characteristic species.



For this procedure, it is necessary to optimize the sequence length, the number of ONPs, the number of SCs and the length of the pooled DNA that can represent a hybridization spot.

11-meric ONPs are the shortest ONPs that can currently be successfully hybridized. This would a priori mean that 4^11, or 4,194,304, ONPs are necessary to detect each sequence. The same number of independent hybridizations would be required for each SC or SC pool (group). Positively hybridizing ONPs would arrange themselves over overlapping 10-mers. In this manner, the DNA sequence of the given SC would be obtained.
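The idealized readout assumed in this section can be sketched in a few lines of Python: under perfect-match conditions an SC lights up exactly those ONPs whose sequence occurs in it, so the set of positive 11-mers is the set of 11-mer substrings of the SC. The function name `positive_onps` and the toy target are illustrative assumptions; probes are written in the same sense as the sequence they detect (identity standing in for complementarity), and a real experiment yields spot intensities rather than strings.

```python
from itertools import product

ONP_LEN = 11  # 11-meric probes, as in the text


def positive_onps(target: str, k: int = ONP_LEN) -> set:
    """Idealized hybridization: a k-mer probe is positive iff it occurs in the target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


# The complete 11-mer probe set would contain 4**11 = 4,194,304 members.
assert 4 ** ONP_LEN == 4_194_304

# Toy illustration with k = 3, where the complete probe set has only 64 members.
all_trimers = {"".join(p) for p in product("ACGT", repeat=3)}
hits = positive_onps("ACGTTGCA", k=3)
print(len(all_trimers), sorted(hits))
```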

The process of SC sequence arrangement is interrupted when the overlapping 10-mer is repeated in the given SC. In this manner, uninterrupted sequences are obtained only between repeated 10-mers or longer oligonucleotide sequences (ONSs). These SC sequence fragments (SFs) cannot always be arranged into an unambiguous linear array without additional information. For this reason, it is important to determine the probable number of SFs (N_SF) for a given DNA length by use of probability calculations.
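How a repeated overlap word interrupts the arrangement can be illustrated on a known target. This is a sketch under the same idealized readout: the hypothetical helper `split_into_sfs` cuts the target wherever an overlap word (length 10 for 11-mer ONPs) occurs more than once, so that neighbouring SFs share the repeated word. A toy overlap length of 3 keeps the example readable.

```python
from collections import Counter


def split_into_sfs(target: str, overlap: int) -> list:
    """Split a target into sequence fragments (SFs) at repeated overlap words.

    Idealized model: arrangement can pass through an overlap word only if
    that word occurs once in the target. Every occurrence of a repeated word
    ends one SF and starts the next, so neighbouring SFs share that word.
    """
    words = [target[i:i + overlap] for i in range(len(target) - overlap + 1)]
    counts = Counter(words)
    breaks = [i for i, w in enumerate(words) if counts[w] > 1]
    sfs, start = [], 0
    for pos in breaks:
        sfs.append(target[start:pos + overlap])
        start = pos
    sfs.append(target[start:])
    return sfs


print(split_into_sfs("GGACGTTTACGAA", overlap=3))
# → ['GGACG', 'ACGTTTACG', 'ACGAA']
```

A single word (ACG) occurring twice already yields three SFs: two that contain the ends of the target and one that lies between them.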



ONSs are distributed in a randomly formed long DNA sequence according to a binomial distribution. The average distance (A_L) between identical neighboring ONSs depends only on the ONS length (L) and is obtained from the expression A_L = 4^L. The probability that the ONS will be repeated N times in a fragment having a length of L_f base pairs (bp) is given by the equation

$$P(N, L_f) = C(N, L_f)\left(\frac{1}{A_L}\right)^{N}\left(1 - \frac{1}{A_L}\right)^{L_f - N} \qquad (1)$$

wherein C(N, L_f) is the number of combinations of class N of L_f elements. The expected number of different ONSs of length L (of average distance A_L) that repeat themselves N times in the fragment of L_f bp is given by the product P(N, L_f) × A_L. If the arrangement of sequences is done via an overlapping ONS of length L with average distance A_L, then N_SF in the fragment L_f is given by

$$N_{SF} = 1 + A_L \sum_{N \ge 2} N \, P(N, L_f) \qquad (2)$$

In the event that all (4×10^6) 11-mers are used, about 3 SFs are expected in a 1.5-kb-long L_f. We return to the problem of SF arrangement later.
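Eq. 2 can be evaluated numerically to check the figures quoted here. The sketch below assumes the reading N_SF = 1 + A_L Σ_{N≥2} N·P(N, L_f), with A_L = 4^L for an overlap of length L and P(N, L_f) the binomial probability of Eq. 1; the binomial terms are generated by their ratio recurrence rather than from large factorials. Under these assumptions it gives roughly 3.1 SFs for 11-mer ONPs (10-mer overlap) on 1.5 kb, and roughly 3.4 for the 8-mer group cores (7-mer overlap) on the 200-bp fragments discussed below.

```python
def expected_sfs(overlap_len: int, fragment_len: int) -> float:
    """Eq. 2: N_SF = 1 + A_L * sum_{N>=2} N * P(N, L_f), with A_L = 4**L
    and P(N, L_f) the binomial probability of Eq. 1."""
    a_l = 4 ** overlap_len
    p = 1.0 / a_l
    prob = (1.0 - p) ** fragment_len  # P(0, L_f)
    weighted = 0.0
    for n in range(fragment_len):
        # Binomial recurrence: P(n+1) = P(n) * (L_f - n) / (n + 1) * p / (1 - p)
        prob *= (fragment_len - n) / (n + 1) * p / (1.0 - p)
        if n + 1 >= 2:
            weighted += (n + 1) * prob
    return 1.0 + a_l * weighted


print(round(expected_sfs(10, 1500), 2))  # 11-mer ONPs, 10-mer overlap, 1.5 kb
print(round(expected_sfs(7, 200), 2))    # 8-mer group cores, 7-mer overlap, 200 bp
```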

The 4×10^6 syntheses needed to obtain all 11-meric ONPs are uneconomical for the practical use of sequencing by hybridization (SBH). Deleting a significant number of ONPs (more than 20%) is not advantageous because it leads to unread gaps in the sequence. A much better method for reducing the number of ONP syntheses and independent hybridizations is the use of arranged ONP groups. In this case, shorter fragments must be sequenced, but there are no gaps in the sequence. The number of syntheses and hybridizations is reduced 40-fold, but 7 times more SCs are needed.

From the standpoint of information, the use of arranged ONP groups is the same as the use of shorter ONPs. For example, there are 65,536 different 8-meric ONSs. Since, according to our current knowledge, an 8-mer ONS cannot form a stable hybrid, a group of 11-mers can be used as an equivalent. Common to all 11-mers in the group is one 8-mer, so that information is obtained only about its presence or absence in the target DNA. The anticipated groups of 11-mers each contain 64 ONPs of the type (N2)N8(N1) (the 5'→3' orientation is in the writing direction; (Nx) denotes x unspecified bases and Ny denotes y specific bases). With about 65,000 such groups, all sequences are detected. Based on Eq. 2, we find that an average of 3 SFs is expected in 200-bp-long DNA fragments. Because of variability, some fragments of this length will have 10 or more SFs.
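An (N2)N8(N1) group can be expanded, and its idealized readout modelled, as below. This sketch assumes that a group spot is positive whenever any of its 64 members occurs in the target, i.e. whenever the 8-mer core occurs with at least two bases before it and one base after it. `expand_group` and `group_is_positive` are hypothetical names, and the core is arbitrary.

```python
from itertools import product


def expand_group(core: str, left: int = 2, right: int = 1) -> list:
    """All members of an (N2)N8(N1)-type group: `left` and `right`
    unspecified positions flanking a specified core."""
    members = []
    for flank in product("ACGT", repeat=left + right):
        members.append("".join(flank[:left]) + core + "".join(flank[left:]))
    return members


def group_is_positive(core: str, target: str, left: int = 2, right: int = 1) -> bool:
    """Idealized readout: the group lights up iff the core occurs with at
    least `left` bases before it and `right` bases after it."""
    start = target.find(core)
    while start != -1:
        if start >= left and start + len(core) + right <= len(target):
            return True
        start = target.find(core, start + 1)
    return False


group = expand_group("ACGTTGCA")
print(len(group), group[0])  # 64 AAACGTTGCAA
```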

ONPs of type (N2)N8(N1) are not very suitable for sequencing mammalian DNA because of the nonrandom GC and dinucleotide composition of this DNA. The common sequence of the ONP group must be longer if it [illegible] in both cases. [illegible] 1,000 ONPs. [illegible] additional 11,000 ONPs of the [illegible]N8(N1) type, where N8 denotes 7000 M13 8-mers and (N1) denotes each of the 3 nucleotides that is not present next to the given 8-mer.

Our calculations to date refer to sequencing single-stranded DNA. For the sequencing of double-stranded DNA, it is not necessary to synthesize both complementary ONPs, and the number of required ONPs is thus halved. Because of the convenience of the M13 system, however, we will stay with the sequencing of single-stranded DNA. In this case, the use of only one half of the ONPs on the SCs will lead to gaps of unread sequence. A gap in a given SC, however, will be read in the SC containing the complementary strand. In a representative random SC library, each sequence is represented on average 10 times, so it is probable that each sequence will be cloned in both orientations, namely that both DNA strands will be read. It is thus possible to use only noncomplementary ONPs, at the cost of more extensive computation. This means that the total number of required ONPs would be about [illegible]0,000. If it were possible to construct an M13 vector that could simultaneously or successively package both strands, the use of noncomplementary ONPs would result in no additional requirements.
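The halving of the probe set can be made concrete by keeping one probe from each probe/reverse-complement pair. The sketch below uses a hypothetical selection rule (keep the lexicographically smaller of the two). For odd lengths such as 11, no probe is its own reverse complement, so exactly half remain (2,097,152 of 4^11); for even lengths the self-complementary probes are kept once. Small lengths are used so the enumeration runs instantly.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def noncomplementary_set(k: int) -> set:
    """One probe from each probe / reverse-complement pair (the
    lexicographically smaller one); self-complementary probes are kept once."""
    chosen = set()
    for letters in product("ACGT", repeat=k):
        probe = "".join(letters)
        chosen.add(min(probe, reverse_complement(probe)))
    return chosen


print(len(noncomplementary_set(5)))  # 512 = 4**5 / 2 (odd k: no self-complementary probes)
print(len(noncomplementary_set(2)))  # 10 = (16 + 4 self-complementary) / 2
```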

All SCs and/or SC groupings (SC pools) are hybridized with all anticipated ONPs. In this manner, for each SC or SC pool we obtain a set of positively hybridizing ONPs. The ONPs are arranged in sequences by overlapping over the common sequences, which are only one nucleotide shorter than the ONP. For fast detection of overlapped ONPs, it is necessary to determine in advance, for each synthesized ONP, which ONPs show maximum overlap with it. Thus, each ONP will have its set of ONPs (ONPa, ONPb, ONPc, ONPd at the 5' end and ONPe, ONPf, ONPg, ONPh at the 3' end). The arrangement is thus achieved by detecting which of the four ONPs at the 5' end and which of the four ONPs at the 3' end hybridized positively to the given SC or SC pool. The arranging continues until two positive overlapped ONPs are found for the last ONP arranged. When all SFs have been extended to a maximum, this computer-assisted process ends.
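The neighbour-based arrangement can be sketched as follows, assuming an idealized set of positive ONPs. Each ONP has four candidate neighbours on each side (sharing all but one base), and an SF is extended only while exactly one of them is positive. Here the neighbours are computed on the fly rather than stored in a precomputed table; the helper names are hypothetical, and the toy uses 4-mer ONPs to keep the example short.

```python
def right_neighbours(onp: str) -> list:
    """The four ONPs that overlap this ONP's 3' end by all but one base."""
    return [onp[1:] + base for base in "ACGT"]


def left_neighbours(onp: str) -> list:
    """The four ONPs that overlap this ONP's 5' end by all but one base."""
    return [base + onp[:-1] for base in "ACGT"]


def extend_sf(seed: str, positives: set) -> str:
    """Grow an SF from a positive seed ONP in both directions, stopping when
    zero or more than one overlapped neighbour is positive (or a loop closes)."""
    sf, seen = seed, {seed}
    current = seed
    while True:  # extend toward the 3' end
        hits = [n for n in right_neighbours(current) if n in positives]
        if len(hits) != 1 or hits[0] in seen:
            break
        current = hits[0]
        seen.add(current)
        sf += current[-1]
    current = seed
    while True:  # extend toward the 5' end
        hits = [n for n in left_neighbours(current) if n in positives]
        if len(hits) != 1 or hits[0] in seen:
            break
        current = hits[0]
        seen.add(current)
        sf = current[0] + sf
    return sf


target = "GGACGTTTACGAA"
positives = {target[i:i + 4] for i in range(len(target) - 3)}
for seed in ("GGAC", "CGTT", "CGAA"):
    print(seed, extend_sf(seed, positives))
```

Because the 3-base word ACG occurs twice in this target, no seed yields the whole sequence. The three seeds recover the three SFs GGACG, ACGTTTACG and ACGAA.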

By use of the given ONP groups, the number of SFs for a given DNA length is increased. In the general case, unambiguous arrangement is possible for a maximum of 3 SFs per SC, counted by the method by which N_SF was calculated by Eq. 2. Two of these are recognized as the terminal ones, and the third is logically in the middle. The problem of SF arrangement cannot be solved by choosing a suitable SC length, because such SCs would be too short. Our solution is the mutual arrangement of SFs and a large number of SCs, so that SC pools, too, can be used as a hybridization spot, and/or competitive hybridization of labeled and unlabeled ONPs.

To obtain sequences that are as complete as possible by SBH, three SC libraries in the M13 vector are required: libraries with inserts of 0.5 kb and of 7 kb, as well as a library of inserts of different sizes made up of two sequences which in the genomic DNA are separated by about 100 kb (skipping SCs). The first library serves primarily for arranging the SFs. These SCs can also be kept for later experimental use. The SCs participate in the hybridization as pools obtained using phages grown by simultaneous infection or after they have been grown. The second library is the basic one. Its SCs, with their larger inserts, [illegible]. 7 kb was chosen as the upper limit for the size of inserts that can successfully be cloned in M13. The third library serves to correctly link into a single sequence pairs of sequences that are separated by highly homologous sequences longer than 7 kb or by uncloned DNA fragments.

Following hybridization of the SCs of all libraries with all ONPs and after SF computing, the mutual arrangement of SFs and SCs is undertaken. The basic library is arranged first. The overlapped SCs are detected through the content of the entire starting SF of the starting SC, or parts thereof. Suitable for this purpose are, in the general case, all SFs with a length of about 20 bp and longer. The average SF length of these SCs was calculated by Eq. 2 and found to be from 2 to 12 bp. This indicates the existence of a sufficient number of SFs of suitable length. Moreover, these SFs, of which there are mostly two, are known, and one of them follows the starting SF. In this case, both sequences are examined, one of them being the right one, and the overlapped SCs are detected via this sequence. The exact displacement of the overlapping SCs relative to the starting SC is determined on the basis of the resulting SF content. At the same time, by detecting all SCs that overlap with the starting SC, the SFs of the starting SC are grouped into a linear array of subsets (SSFs). The SSFs are defined by neighboring endings of overlapped SCs (start-start, start-end or end-end). The SC overlapping process is continued via the SF taken from the most protruding SSF of the most protruding SC. The arranging process is interrupted when an uncloned part of the DNA is encountered or, as in SF formation, when a repeated sequence longer than 7 kb is encountered. This procedure affords maximum-size groups of arranged, overlapped 7-kb-long SCs and linearly ordered SSFs of their SFs.
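Detection of overlapping SCs through the content of a starting SF can be sketched as a subset test: an SC is taken to overlap the starting SC if its set of positive ONPs contains every ONP of the starting SF. This assumes a perfect readout and a whole-SF match. The text also allows using part of an SF, and real data would need tolerance for missed or spurious signals. The names, the toy genome and the 4-mer probe length are illustrative assumptions.

```python
def probes_of(seq: str, k: int) -> set:
    """All k-mer ONPs contained in a sequence (idealized positive set)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlapping_scs(start_sf: str, sc_positives: dict, k: int) -> list:
    """Names of SCs whose positive-ONP set contains every ONP of the starting SF."""
    needed = probes_of(start_sf, k)
    return sorted(name for name, hits in sc_positives.items() if needed <= hits)


genome = "TTGACCATGCAGGTCAAGTCCGAT"
scs = {"a": genome[0:12], "b": genome[6:18], "c": genome[12:24]}
positives = {name: probes_of(seq, 4) for name, seq in scs.items()}
print(overlapping_scs("ATGCAG", positives, k=4))  # ['a', 'b']
```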



In this procedure for arranging SFs, the DNA length that includes the SSFs is essential. This length depends on the number of SSFs, which is equal to the number of SC endings, namely it is twice as large as the number of SCs. For a representative library of DNA fragments of one million bp, 700 7-kb-long SCs are needed. This means that the average SSF size is 715 bp. The actual average number of SFs within such an SSF is not even one tenth of all SFs of the entire 7-kb SC. The actual number is independent of the SC length; it depends only on the SSF length. According to Eq. 2, for a length of 715 bp and an A_L of 30,000, which the anticipated ONPs have, the expected average number of SFs, with an average length of 45 bp, is 16.
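The SSF arithmetic can be checked with a short sketch. It assumes that the SC ends split the DNA into about twice as many SSFs as there are SCs. It also uses a closed form of Eq. 2 obtained from the binomial identity Σ_{N≥2} N·P(N, L_f) = L_f/A_L − P(1, L_f). Evaluated this way, Eq. 2 gives about 17.8 SFs for 715 bp at A_L = 30,000, of the same order as the 16 stated in the text.

```python
def average_ssf_size(n_bp: int, n_sc: int) -> float:
    """Each SC has two ends, so 2 * n_sc ends divide n_bp into about 2 * n_sc SSFs."""
    return n_bp / (2 * n_sc)


def expected_sfs_closed(a_l: float, lf: int) -> float:
    """Eq. 2 in closed form. Since sum_{N>=2} N * P(N, L_f) = L_f / A_L - P(1, L_f),
    N_SF = 1 + L_f * (1 - (1 - 1 / A_L) ** (L_f - 1))."""
    return 1.0 + lf * (1.0 - (1.0 - 1.0 / a_l) ** (lf - 1))


ssf = average_ssf_size(1_000_000, 700)
print(round(ssf))                                  # 714, i.e. the ~715 bp above
print(round(expected_sfs_closed(30_000, 715), 1))  # 17.8
```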

The arrangement of SFs within the SSFs obtained is accomplished via a 0.5-kb SC library. In this procedure, it is not essential that these be individual SCs; an SC pool can also be used. The SCs in a pool are informative if they do not overlap with each other. From an information and technology standpoint, a 10-kb pool of cloned DNA is advantageous, although this does not represent a limit. The required number of these SCs or pools is such that the maximum size of the SSFs they form will not be greater than 300 bp. With the proposed ONPs, we anticipate 3 SFs within this DNA length (Eq. 2), which, as already explained, can be arranged unambiguously. By use of the binomial distribution, we derived the equation

$$N_{SA} = 2N_{SC}\left(1 - \frac{L_{ms}}{N_{bp}}\right)^{2N_{SC} - 1} \qquad (3)$$

wherein N_SA is the number of SSFs greater than L_ms, N_SC is the number of SCs, and N_bp is the number of base pairs in the DNA fragment or molecule being sequenced. L_ms is the SSF size that, on average, gives the arrangement number of 3 SFs, which in this case is 300 bp. On the basis of this equation, it was determined that [illegible],000 0.5-kb SCs are needed for a DNA fragment of one million bp. The number of 10-kb pools is 1220. The average SSF size obtained for these SCs is 20 bp.
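Eq. 3 can be tabulated to see how quickly long SSFs disappear as the number of 0.5-kb SCs grows. The sketch assumes the exponent 2N_SC − 1, i.e. the probability that none of the other SC ends falls within L_ms of a given end; with 2N_SC instead, the values are practically identical. The SC counts in the sweep are arbitrary.

```python
def ssfs_longer_than(l_ms: float, n_sc: int, n_bp: int) -> float:
    """Eq. 3: expected number of SSFs longer than l_ms when n_sc SCs
    (2 * n_sc random ends) are placed on n_bp base pairs."""
    ends = 2 * n_sc
    return ends * (1.0 - l_ms / n_bp) ** (ends - 1)


for n_sc in (10_000, 20_000, 24_000, 30_000):
    print(n_sc, round(ssfs_longer_than(300, n_sc, 1_000_000), 4))
```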

SF arranging is done by computer-assisted detection of pools containing SCs that overlap the starting SSF obtained by arranging the basic library. The detection is performed on the basis of the content of an entire randomly selected SF, or part thereof, in the starting SSF. Based on the content of the other SFs of the starting SSF, the size of the overlap of the 0.5-kb SCs is determined, and at the same time, because of their high density, the order of SFs in the starting SSF is also determined. At the end of this process, one obtains the sequence of each group of arranged 7-kb SCs and information as to which pool contains the 0.5-kb SC that has a particular sequence. At a certain small number of locations, the information will not be complete or it will be ambiguous. Our calculations show that this happens on average at less than one location per million bp, with the randomly distributed [illegible] ONSs amounting to 30%. These locations are sequenced by suitable treatment of the SCs containing them and repeated application of SBH, by competitive hybridization of suitably selected pairs of unlabeled and labeled ONPs, by the conventional method, or by the advanced conventional method.



The competitive hybridization procedure will be explained using the example of a twice-repeated 7-bp sequence. In this case, two SFs terminate and two start with the repeating sequence TTAAAACG, which is underlined:

rNNNrWNNNNNNCA TTAAAACGl* 
S'NNNNNWNNNNNNCG TTA A a aCCI * 

5TrAAAACCTAgNNMMMNNV 
STTAAAAGGgCCNNNMNNNr 



By prehybridization with excess unlabeled ONP, for example 5'(N2)GATTAAAAG(N1)3', which, because of a noncomplementary base, cannot hybridize to 5'NNCCTTAAAACC3', the subsequent hybridization of one of the two labeled ONPs, i.e. [illegible] or [illegible]CCC(N1)3', is prevented. The pair [illegible]
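The logic of competitive hybridization (and of Claim 4) can be sketched with a toy model. A saturating unlabeled probe occupies every site it matches, and a labeled probe gives a signal only if it finds a matching site that does not overlap an occupied one. This binary site-blocking model, the helper names and the toy sequence (a 7-bp repeat TTAAAAC occurring twice) are illustrative assumptions rather than the experimental chemistry. Probes are written in the sense of the sequence they detect. The SF that follows the blocked repeat is the one whose labeled probe is silenced.

```python
def find_all(seq: str, probe: str) -> list:
    """Start positions of every exact (perfect-match) occurrence of probe."""
    hits, i = [], seq.find(probe)
    while i != -1:
        hits.append(i)
        i = seq.find(probe, i + 1)
    return hits


def labeled_signal(target: str, cold: str, labeled: str) -> bool:
    """True if the labeled probe finds a site not overlapped by any site
    occupied by the saturating unlabeled (cold) probe."""
    blocked = [(s, s + len(cold)) for s in find_all(target, cold)]
    for s in find_all(target, labeled):
        e = s + len(labeled)
        if not any(s < b_end and b_start < e for b_start, b_end in blocked):
            return True
    return False


def following_sf(target: str, cold: str, candidates: dict) -> list:
    """Candidate SFs whose labeled probe is silenced by the cold probe."""
    return [name for name, probe in candidates.items()
            if not labeled_signal(target, cold, probe)]


target = "GGGA" + "TTAAAAC" + "CTAG" + "TT" + "CCCG" + "TTAAAAC" + "GGCC"
cold = "GATTAAAAC"  # end of the SF of interest followed by the repeat
candidates = {"sf_a": "TTAAAACCT", "sf_b": "TTAAAACGG"}  # repeat + start of each next SF
print(following_sf(target, cold, candidates))  # ['sf_a']
```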
PATENT CLAIMS



1. Procedure for sequencing the entire genomic DNA or large parts thereof by hybridization with oligonucleotide probes, characterized in that the replicated fragments of genomic DNA are hybridized with some or all 8- to 20-nucleotide-long ONPs resulting from the variation and repetition of the 4 nucleotides A, T or U, G, C, or their derivatives and analogs, by using individual ONPs or a mixture of individually synthesized ONPs, or an arranged ONP group synthesized so that more than one or all nucleotides or their derivatives or analogs are added at a certain point during synthesis, and that the hybridization reaction is carried out under conditions in which the oligonucleotide probes hybridize only with a fully homologous sequence or with a sequence that has an amount of nonhomology that does not cause the formation of ambiguous or faulty sequences in the process of arranging positively hybridizing ONPs via a maximum mutual overlap of their sequences.

2. Procedure according to Claim 1, characterized in that replicated fragments of genomic DNA are obtained by cloning into vectors based on single-stranded bacteriophages in the form of three subclone libraries, with inserts of 0.1 to 1 kb, with inserts of 3 to 10 kb, and with inserts consisting of two parts separated in the genomic DNA by an average of 50 to 200 kb, that they are replicated as individual subclones and as SC groups obtained by simultaneous infection, and that they are hybridized on the filter to which they are applied as a hybridization spot, as uninterrupted or cut-out vector-insert DNAs of individual subclones and groups of subclones, up to fragments of an average length of 20 bp.

3. Procedure according to Claims 1 and 2, characterized in that the subfragments of the sequence of the individual subclones or groups of subclones, obtained by overlapping positively hybridizing ONPs for the given subclone or group of subclones, are arranged into a natural linear array by cyclic detection of overlapping subclones based on the content of subfragments of the sequence of the starting subclone or group of subclones, which subclones in a library of 0.1 to 1 kb show an average displacement of less than 100 kb.

4. Procedure according to Claims 1 and 2, characterized in that the subfragments of the sequence of the individual subclone or group of subclones, obtained by overlapping positively hybridizing ONPs for the given subclone or group of subclones, are arranged into a natural linear array by the procedure of competitive hybridization with unlabeled and labeled oligonucleotide probes, whereby the filter is first hybridized with a saturating amount of unlabeled oligonucleotide probe, which contains all or part of the terminal, repeating oligonucleotide sequence in the sequence subfragment for which it is desired to determine the following sequence subfragment, and then, with or without previous covalent bonding of this cold probe to the filter, separate hybridizations are carried out with labeled oligonucleotide probes containing all or part of the repeating oligonucleotide sequence, so [illegible] which is contained in the unlabeled probe, and [illegible] at least a part is common to that part of the repeating sequence which [illegible] from each sequence subfragment [illegible] of the nonrepeating sequences that follow the repeating sequence [illegible] that contains the repeating sequence, and the following sequence subfragment is determined as that whose labeled oligonucleotide probe does not hybridize.

5. Procedure according to Claims [illegible], characterized in that [illegible] DNA is carried out with individual hybridization spots containing [illegible] groups of 20, or of on average 20, M13 subclones 0.5 kb in length, 700 M13 subclones 7 kb in length and 170 M13 subclones which skip on average 100 kb of the genomic DNA, and by hybridization of each spot with 1024 groups of 16 probes each of the (A,T,C,G)N10(A,T,C,G) type, where N10 are all the 10-mers that do not contain the G and C nucleotides, with 23,040 groups of 16 probes each of the (A,T,C,G)N9(A,T,C,G) type, where N9 are all the 9-mers containing one or two G or C nucleotides, with 55,834 groups of 64 probes each of the (A,T,C,G)(A,T,C,G)N8(A,T,C,G) type, where N8 are [illegible] C+G nucleotides, and with all monotonic sequences of the required lengths shorter than 18 bp and consisting of 1- to 7-nucleotide-long repeating units.
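The sizes of the N10 and N9 core sets defined in Claim 5 follow from simple combinatorics. A core of length n with exactly g G/C bases can be chosen in C(n, g) · 2^g · 2^(n−g) = C(n, g) · 2^n ways. The sketch below checks this closed form against brute-force enumeration. The figure for the N8 groups is not reproduced, because its defining condition is not legible in the source.

```python
from itertools import product
from math import comb


def count_cores(n: int, gc_counts: set) -> int:
    """Closed form: number of n-mers whose G+C count lies in gc_counts."""
    return sum(comb(n, g) * 2 ** n for g in gc_counts)


def count_cores_brute(n: int, gc_counts: set) -> int:
    """Brute-force enumeration of all 4**n cores, for checking the closed form."""
    return sum(
        1
        for core in product("ACGT", repeat=n)
        if sum(base in "GC" for base in core) in gc_counts
    )


print(count_cores(10, {0}))    # 1024: 10-mers without G or C
print(count_cores(9, {1, 2}))  # 23040: 9-mers with one or two G/C bases
assert count_cores(6, {1, 2}) == count_cores_brute(6, {1, 2})
```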



Applicant 

(Signed:)

Prof. Dr. Vladimir Glišin
Director



ABSTRACT 



The conditions under which oligonucleotide probes hybridize only with fully homologous sequences are known. By such hybridization and by arranging positively hybridizing probes via overlapping parts, the sequence of the given DNA fragment is read. By simultaneous hybridization of DNA molecules of single-stranded phage vector-cloned inserts, applied in the form of spots, with about 50,000 to 100,000 groups of probes, the main type of which is (A,T,C,G)(A,T,C,G)N8(A,T,C,G), information for the computer-assisted determination of DNA sequences of the complexity of the mammalian genome can be obtained. To obtain sequences that are as complete as possible, three libraries in a vector based on the M13 phage are used: those with 0.5-kb inserts, those with 7-kb inserts, and those with inserts consisting of two sequences separated in the genomic DNA by an average of 100 kb. For one million bp of genomic DNA, 22,000 0.5-kb subclones, 700 7-kb subclones and 170 skipping subclones are needed. The 0.5-kb subclones are applied to the filter in groups of 20, so that the total number of samples is 2120 per million bp. The procedure can be readily and completely robotized for reading complex genomic DNA fragments or molecules in a manufacturing plant.



